Journal of Virology JVI Journal of Virology Accepted ManUSCrTpFPoSted Onlme 


JVI Accepted Manuscript Posted Online 11 January 2017 
J. Virol, doi:10.1128/JVI.01953-16 

Copyright © 2017 American Society for Microbiology. All Rights Reserved. 


1 Title: 

2 Surveillance of bat coronaviruses in Kenya identifies relatives of human coronaviruses NL63 

3 and 229E and their recombination history 

4 

5 Running title: 

6 Bat origin of human coronaviruses 

7 

8 Authors: 

9 Ying Tao 1# , Mang Shi 2# , Christina Chommanard 1 , Krista Queen 1 , Jing Zhang 1 , Wanda 

10 Markotter 3 , Ivan V. Kuzmin 4 '!', Edward C. Holmes 2 , Suxiang Tong 1 

11 

12 Affiliations: 

13 1 Division of Viral Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30333, 

14 USA; 2 Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, 

15 School of Life and Environmental Sciences and Sydney Medical School, The University of 

16 Sydney, Sydney, Australia; 3 Centre for Viral Zoonoses, Department of Medical Virology, 

17 Faculty of Health Sciences, University of Pretoria, Pretoria, South Africa; 4 Division of High 

18 Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, 

19 GA 30333, USA. 

20 # Y.T. and M.S. contributed equally to this work 

21 f Present address: Department of Pathology, University of Texas Medical Branch, Galveston, 

22 TX 77555, USA. 

23 * Correspondence to: Dr. Suxiang Tong, 11600 Clifton Rd, mail stop G18, CDC, 

24 Atlanta, GA 30333; Tel: 4046391372; Email: sotl@cdc.gov. 

25 The findings and conclusions in this report are those of the author(s) and do not necessarily 

26 represent the official position of the Centers for Disease Control and Prevention. 

27 

28 Type of Publication: ‘Full length’ paper 

29 Word count: Abstract (155); Importance (106); Text body (4618) 

1 


Downloaded from http://jvi.asm.org/ on January 11,2017 by UNIV OF CALIF SAN DIEGO 


cfWology JVI Journal of Virology Accepted Mdfl USCH pf Posted Oflllhe 


30 ABSTRACT 

31 Bats harbor a large diversity of coronaviruses (CoVs), several of which are related to 

32 zoonotic pathogens that cause severe disease in humans. Our screening of bat samples 

33 collected in Kenya during 2007-2010 not only detected RNA from several novel CoVs but, 

34 more significantly, identified sequences that were closely related to human Co Vs NL63 and 

35 229E, suggesting that these two human viruses originate from bats. We also demonstrated 

36 that human CoV NL63 is a recombinant between NL63-like viruses circulating in Triaenops 

37 bats and 229E-like viruses circulating in Hipposideros bats, with the break-point located near 

38 5’ and 3’ end of the spike (S) protein gene. In addition, two further inter-species 

39 recombination events involving the S gene were identified, suggesting that this region may 

40 represent a recombination “hotspot” in CoV genomes. Finally, using a combination of 

41 phylogenetic and distance-based approaches we showed that genetic diversity of bat Co Vs is 

42 primarily structured by host species and subsequently by geographic distances. 

43 

44 IMPORTANCE 

45 Understanding the driving forces of cross-species virus transmission is central to 

46 understanding the nature of disease emergence. Previous studies have demonstrated that bats 

47 are the ultimate reservoir hosts for a number of coronaviruses (CoVs) including ancestors of 

48 SARS-CoV, MERS-CoV, and ElCoV-229E. However, the evolutionary pathways of bat 

49 CoVs remain elusive. We provide evidence for natural recombination between distantly- 

50 related African bat coronaviruses associated with Triaenops afer and Hipposideros sp. bats 

51 that resulted in a NL-63 like virus, an ancestor of the human pathogen HCoV-NL63. These 

52 results suggest that inter-species recombination may play an important role in CoV evolution 

53 and the emergence of novel CoVs with zoonotic potential. 
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54 INTRODUCTION 

55 Coronaviruses (CoVs) (subfamily Coronavirinae, family Coronaviridae, order Nidovirales) 

56 are common infectious agents that infect a wide range of hosts including humans, causing 

57 respiratory, gastrointestinal, liver, and neurologic diseases, and that possess the largest 

58 genomes of any RNA viruses described to date (1). The subfamily Coronavirinae is currently 

59 classified into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and 

60 Deltacoronavirus (2). The alphacoronaviruses (alpha-CoV) and betacoronaviruses (beta-CoV) 

61 are exclusively found in mammals while the gammacoronaviruses (gamma-CoV) and 

62 deltacoronaviruses (delta-CoV) are mainly associated with birds. Presently, the greatest 

63 diversity of alpha- and beta-CoVs has been documented in bats, which in part reflects the 

64 more intensive surveillance of these animals since Rhinolophus spp. bats were implicated as 

65 the reservoir hosts for SARS-related CoVs (3, 4). This surveillance resulted in the discovery 

66 of a potential reservoir host (bat) species for another two human CoVs: Human CoV 229E 

67 (HCoV-229E), a relative of which is present in Hipposideros bats (5, 6), and Middle East 

68 respiratory syndrome coronavirus (MERS-CoV), for which related viruses are present in 

69 Pipistrellus, Tylonycteris, and Neoromicia bats (7-10), although the most likely reservoir host 

70 of human MERS-CoV identified to date is the dromedary camel (11). Most recently HCoV- 

71 229E-like CoVs were also identified in camels, although their role in human infection is 

72 unknown (12). 

73 Africa is a major hotspot of zoonotic emerging diseases. With its rich biodiversity, 

74 Africa is inhabited by many bats of different species including those that serve as reservoirs 

75 of important zoonotic diseases such as Marburg hemorrhagic fever and rabies (13). Our initial 

76 screening demonstrated the presence of diverse CoVs in African bats, including those 

77 collected in the southern parts of Kenya during 2006 (14, 15), and in other countries 

78 including South Africa, Nigeria, and Ghana (16). Furthermore, recent studies have provided 
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79 strong evidence that HCoV-229E originated from bat viruses circulating in Africa (5), 

80 underscoring the zoonotic potential of bat-borne CoVs from this continent. 

81 One human coronavirus, HCoV-NL63, was first isolated in 2004 from the aspirate of 

82 a 8-month-old boy suffering from pneumonia in the Netherlands (17). While the clinical 

83 significance of this virus is debated, it has a worldwide distribution and is known to infect 

84 both the upper and lower respiratory tract (18). Based on a phylogeny of the RNA-dependent 

85 RNA polymerase (RdRp), HCoV-NL63 is related to another human virus HCoV-229E and 

86 had no close relatives identified in bats (16). Although Huynh et al. (19) suggested that a 

87 virus (ARCoV.2/2010/USA) isolated from the American tricolored bat ( Perimyotis subflavus) 

88 may share common ancestry with HCoV-NL63, the genetic distance between the two viruses 

89 is large, and their close relationship has not been corroborated in other phylogenetic analyses 

90 (16, 20). Nevertheless, the successful passage of HCoV-NL63 in an immortalized bat cell 

91 line suggests its potential association with bats (19). 

92 As is well appreciated, recombination leads to rapid changes of genetic diversity in 

93 RNA viruses (21). CoVs represent a classic example of viruses with high frequencies of 

94 homologous recombination through discontinuous RNA synthesis (22). Indeed, under 

95 experimental conditions, the recombination frequency can be as high as 25% for the entire 

96 CoV genome (23). Recombination in CoVs is also frequently reported under natural 

97 conditions, including some emerging human pathogens such as SARS-CoV (24, 25), MERS- 

98 CoV (11), HCoV-OC43 (26), and HCoV-NL63 (27), although most reports are between 

99 closely related viruses. 

100 The Global Disease Detection Program (GDD) of the Centers for Disease Control and 

101 Prevention (CDC, Atlanta, GA) is focused on the detection of emerging infectious agents 

102 worldwide. One of the GDD projects was directed toward the detection of such potential 

103 zoonotic pathogens in African bats. Since the initial study performed during 2006 in Kenya 
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104 (14, 15), an expanded surveillance of bat CoVs has been performed in the same and other 

105 countries including Kenya, Nigeria, Democratic Republic of Georgia, Democratic Republic 

106 of Congo, Guatemala, and Peru. The project included more bat species and geographic 

107 locations, allowing a more thorough investigation of the genetic diversity and ecological 

108 dynamics of CoVs circulation in bats. In this study, we performed an ecological and 

109 evolutionary characterization of CoVs circulating in Kenya and identified distinct CoVs from 

110 Triaenops afer and Hipposideros sp. bats that are phylogenetically related to HCoV-NL63 in 

111 different parts of the genome. Based on this data, we propose a scenario for the origin and 

112 evolutionary history of HCoV-NL63 and related viruses. 

113 
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114 MATERIALS AND METHODS 

115 Sample collection. Between 2007 and 2010 a total of 2050 bat specimens were collected 

116 from 30 different locations in Kenya (Table SI) in collaboration with the CDC GDD regional 

117 country office in Kenya and National Museums of Kenya. The bats were captured using mist- 

118 nets, hand nets or manually. The protocol (2096FRAMULX-A3) was approved by the CDC 

119 IACUC and by Kenya Wildlife Services. Upon capture, each bat was measured, sexed and 

120 identified to species by a trained field biologist. Subsequently, fecal and oral swabs (if 

121 possible) were collected in compliance with field protocol and were then transported on dry 

122 ice from the field to -80°C storage before further processing. 

123 

124 CoV RNA detection. Each fecal and oral swab was suspended in 200 pL of a phosphate 

125 buffered saline. Viral total nucleic acids (TNA) were extracted using the QIAamp Mini Viral 

126 Spin kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions, followed 

127 by semi-nested RT-PCR (Superscript III One-Step RT-PCR kit and Platinum Taq kit, 

128 Invitrogen, San Diego, CA, USA) using primer sets designed to target the conserved genome 

129 region of alpha-, beta-, gamma- and delta-CoVs, respectively (15). PCR products of the 

130 expected size (~ 400 nucleotides) were purified by gel extraction using the QIAquick Gel 

131 Extraction kit (Qiagen, Valencia, CA, USA) and sequenced in both directions on an ABI 

132 Prism 3130 automated sequencer (Applied Biosystems, Foster City, CA, USA). As 

133 validation, the RT-PCR procedure was repeated for each of the CoV positive specimens. 

134 

135 Bat mitochondrial gene sequencing. Bat species were further confirmed by sequencing the 

136 host mitochondrial cytochrome b (cytB) gene in each of the CoV-positive specimens. Both 

137 the method and the primers used have been described previously, and a final 1104 bp 

138 fragment of the cytB gene was amplified and sequenced as described previously (14, 15). 
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139 

140 Phylogenetic analyses. This study generated a total of 240 CoV RdRP sequences (402 bp) 

141 from Kenyan bats. These sequences were first aligned in MAFFT v7.013 (28), using amino 

142 acid sequences as a guide for the nucleotide sequence alignment. Phylogenetic trees were 

143 then inferred using the maximum likelihood (ML) method available in PhyML version 3.0 

144 (29) assuming a general time-reversible (GTR) model with a discrete gamma distributed rate 

145 variation among sites (T 4 ) and the SPR branch-swapping algorithm. To produce a more 

146 condensed data set, we clustered the highly similar sequences from the same geographic 

147 location and host species, and randomly chose one or two to represent each cluster. This 

148 condensed data set was subsequently combined with 121 reference sequences representative 

149 of the genetic diversity of alpha- and beta-CoVs on a global scale taken from GenBank. ML 

150 phylogenetic trees of these final alignments were inferred using the same procedure and 

151 substitution models as described above. 

152 

153 Comparisons of viral genetic, geographic, and host genetic distance matrices. To 

154 determine the relationship between viral genetic, geographic, and host genetic distances, we 

155 compiled a data set containing the Kenyan CoV samples generated in this study. The genetic 

156 distance matrices were produced from pairwise comparisons either in the form of uncorrected 

157 percentage differences or calculated from the phylogenetic trees (patristic distance) using the 

158 Patristic vl.O program (30) The geographic distances (Euclidean distance) were calculated 

159 using the formula “distance = (acos((sin(latitudel) * sin(latitude2)) + (cos(latitudel) * 

160 cos(latitude2) * cos(longitude2 - longitude 1)))) * 6371”, with spatial coordinates of the 

161 samples derived from the geographic location information. 

162 We used Mantel correlation analyses to test the extent of the correlation between 

163 these matrices (31). Both simple Mantel’s test and partial Mantel’s test were performed, and 
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164 the correlation was evaluated with 10000 permutations. To access which of the two factors - 

165 geographic or host genetic distance - best explained total variation in the virus genetic 

166 distance matrices, we performed multiple linear regression on these distance matrices (32). 

167 The statistical significance of each regression was evaluated by performing 10000 

168 permutations. To examine whether the degree of virus genetic relatedness corresponded to 

169 the scale of geographic distance or host relatedness, we generated Mantel correlograms. In 

170 each correlogram, 10-12 distance classes were assigned following an equal-frequency 

171 criterion: each class had similar number of pairwise comparisons. All statistical analyses 

172 were performed using the Ecodist package implemented in R3.0.2 (33), and all statistical 

173 results were considered significant at the P = 0.05 level. 

174 

175 Full genome sequencing and sequence analyses. Five viruses representative of the full 

176 diversity of the CoVs newly described here were selected for full genome sequencing: 

177 BtKY229E-l, BtKY229E-8, BtKYNL63-9a, BtKYNL63-9b, and BtKYNL63-15. We first 

178 sequenced a number of conserved regions throughout the genome using several semi-nested 

179 or nested consensus degenerate RT-PCR amplicons. These regions were then bridged using 

180 sequence-specific RT-PCR followed by Sanger sequencing (< 2 kb) or using the PacBio 

181 platform (> 2 kb). The assembled consensus genome sequences from PacBio sequencing 

182 were later confirmed by sequence-specific RT-PCR and Sanger sequencing (GenBank 

183 accession numbers KY073744-KY073748). The 5’ and 3’ genome termini were not 

184 determined due to the limited RNA remaining, and were derived with PCR primers based on 

185 the conserved genome regions in alpha-CoVs. 

186 For each complete genome sequence, potential ORFs were predicted based on the 

187 conserved core sequence, 5'-CUAAAC-3', with a minimum length of 66 amino acids. 

188 Ribosomal frameshifts were identified based on the presence of the conserved slippery 
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189 sequence, “UUUAAAC”. For phylogenetic analyses, the data set was first separated into six 

190 ORFs, namely; ORFla, ORFlb, Spike (S), Envelope (E), Membrane (M), and Nucleoprotein 

191 (N) genes. The data set for each gene was translated into amino acid sequences and aligned 

192 using MAFFT v7.013. Phylogenetic trees were then inferred using PhyML as described 

193 above. Recombination events were first identified from the occurrence of incongruent 

194 topologies in these initial phylogenies, and were then confirmed and characterized using 

195 Simplot v3.5.1 (34). In the Simplot analysis, seven sequences were analyzed, including the 

196 potential recombinant, the parental viruses, as well as an outgroup. The similarity 

197 comparisons of recombinant and the other sequences were plotted using a sliding window 

198 with a size of 1000 bp and a step size of 10 bp. 

199 

200 RESULTS 

201 Prevalence of CoV in Kenyan bats. We examined bats from at least 27 species (17 genera) 

202 collected over a four year period (2007-2010) from 30 locations across the southern part of 

203 Kenya (Figure 1). A total of 2,050 bats samples were screened for CoV RNA using a pan- 

204 coronavirus RT-PCR assay. We found an overall prevalence of 11.7% (240/2,050 bats) 

205 (Table SI). This overall prevalence is in line with recent reports of CoVs in bats from 

206 numerous locations including South Africa, Mexico, Philippines, Kenya, United Kingdom, 

207 Japan, Italy, and Ghana (6, 14, 15, 35-40). 

208 Bats of the species tested ( Chaerephon pumilus, Coleura afra, Lissonycteris 

209 angolensis, Miniopterus africanus, Neoromicia tenuipinnis, Neoromicia sp., Nycteris sp., 

210 Pipistrellus sp., and Scotoecus sp.) did not yield CoV positive samples although the sample 

211 number was limited and might not reflect the real prevalence (Table SI). Conversely, in bats 

212 of several other species the CoV prevalence was high ( Cardioderma cor, 25%; Eidolon 

213 helvum, 21%; Epomophorus labiatus, 28.6%; Hipposideros sp., 27.6%; Miniopterus minor, 
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214 22.6%; Otomops martiensseni, 28.6%; Rhinolophus hildebrandtii, 31.3%; Rhinolophus sp., 

215 28.9%; Triaenops afer, 26.7%). Most species (21/27) were sampled at more than one 

216 location. Of note, we detected Co Vs in 21% of E. helvum bats tested in Kenya, whereas a 

217 previous study in Ghana failed to detect any CoVs in a similar number of bats from this 

218 species (6). 

219 

220 Phylogenetic diversity of Kenyan bat CoVs. The viral sequences identified in Kenyan bats 

221 showed a remarkable diversity within both alpha- and beta-CoVs (Figure 2). Based on our 

222 phylogenetic analysis, the CoVs newly identified here can be grouped into 20 phylogenetic 

223 lineages (Figure 2). Many of the sampled bat genera are associated with more than one viral 

224 lineage. Furthermore, in some cases, the divergence of the CoVs within the same host genera 

225 may also be associated with possible differences in sample types. For example, we found two 

226 lineages of CoV in Rousettus aegyptiacus bats, one of which was present in oral swabs 

227 (Figure 2: L7 Rousettus ) while the other one was identified in fecal swabs (L17 Rousettus). 

228 The default tissue tropism for bat CoVs is believed to be intestinal and samples of choice are 

229 fecal swabs. In agreement with this, only four viruses were identified from oral swab samples 

230 (L7 Rousettus ) as indicated in the phylogeny (Figure 2). 

231 Our phylogenetic analyses also revealed a number of cross-species transmission 

232 events at the genus level, many of which appeared to be transient spill-overs with no evidence 

233 of onward transmission. This pattern was observed as CoV sequences recovered from bats of 

234 a particular genus located as tree tips within the phylogenetic diversity that is mainly 

235 associated with a different bat genus. From our Kenyan data set, there were seven such cross- 

236 species transmission events in total, each represented by a single sequence (dotted red in 

237 Figure 2), suggesting these are most likely viruses with limited transmission within new hosts, 

238 although this hypothesis requires confirmation on a larger set of samples. 
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239 A more comprehensive and informative phylogeny (Figure 3) was obtained after 

240 including the representative global CoV sequences from GenBank, which also included the 

241 Kenyan viruses previously reported (15). The phylogeny, which included viral sequences 

242 recovered from bats of more than 50 species (30 genera), resulted in an accurate phylogenetic 

243 assignment of the viruses described in this study (Figure 3). Importantly, the newly 

244 discovered viruses from Kenya have greatly extended our previous work (15) in terms of: (i) 

245 expanding the diversity of existing lineages, including the Miniopterus, Rhinolophus, and 

246 Scotophilus associated CoV clusters in the genus Alphacoronavirus, and the Rousettus and 

247 Rhinolophus associated CoVs clusters in the genus Betacoronavirus', and (ii) the discovery of 

248 new viruses from either a novel bat host (i.e. Triaenops) or new divergent CoV clusters in 

249 known hosts (i.e. Rhinolophus, Rousettus, Cliaerephon, etc) (Figure 3). 

250 The phylogeny suggests both ancient virus-host co-divergence and recent cross- 

251 species transmission of CoVs between bats and other mammalian hosts. The phylogeny 

252 clearly demonstrates that CoVs from two host groups, one dominated by bats and the other 

253 exclusively by non-chiropteran mammals, formed sister clades for both alpha- and beta-CoVs 

254 (Figure 3), suggestive of an ancient divergence between them. Conversely, several non- 

255 chiropteran CoVs are nested within the diversity of bat CoVs, suggesting that these viruses 

256 are relatively recent introductions from bats. These cross-species transmission events resulted 

257 in emergence of severe (SARS-CoV and MERS-CoV) and mild (HCoV-NL63 and HCoV- 

258 229E) human pathogens, as well as animal pathogens (Porcine epidemic diarrhea virus 

259 [PEDV] and Alpaca respiratory CoV). Interestingly, HCoV-NL63, previously thought to be 

260 related to North American tricolored bat ( P. subflavus ) (19), in our phylogeny is deeply 

261 nested within the newly identified CoVs from African Triaenops afer bats (Figure 3), while 

262 the P. subflavus virus (labeled green in Figure 3) grouped with a North American CoV 

263 sampled from a Myotis volans bat (Figure 3). Therefore, Triaenops afer bats likely represent 
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264 the most recent chiropteran reservoir host of viruses ancestral to HCoV-NL63. In addition, 

265 our results identified 16 additional 229E-like viruses (L14, Figure 2), providing further 

266 evidence that Hipposideros bats in Africa harbor viruses that are ancestral to HCoV-229E (5, 

267 6). 

268 

269 Host and spatial dynamics of bat CoVs in Kenya. We used Mantel’s test to compare the 

270 virus and host genetic distance matrices, as well as virus and geographic distance matrices. 

271 Notably, the correlation values were positive and highly significant in both comparisons 

272 (Table 1), suggesting that both host and geography have shaped the structure of virus genetic 

273 diversity. This conclusion remained following partial Mantel analyses and multiple linear 

274 regression analyses in which we tested the effect between two matrices while controlling for 

275 the third (Tables 1 and 2). Importantly, however, in both simple and partial Mantel analyses, 

276 the virus genetic distance matrices had much higher correlation with host genetic distance 

277 matrices than with geographic distance matrices (Table 1), indicating that bat CoV diversity 

278 is more structured by host than by geographic distance. 

279 Next, we used Mantel autocorrelograms to examine the effect of (i) geographic 

280 distance (Figure 4A) and (ii) host genetic distance on virus diversity (Figure 4B). Host 

281 genetic distance decreased from highly significantly positive at short taxonomic distances to 

282 highly significantly negative at long distances. Importantly, the crossing-over point was at a 

283 host genetic distance of around 0.15-0.19, which marks the boundary of intra- and inter- 

284 genera host diversity (Figure 4B). However, no obvious clinal patterns in geographic distance 

285 were observed within the Kenyan data set. 

286 

287 Full genome characterization and recombination analyses of NL63-like and 229E-like 

288 viruses. To further explore the evolution of the NF63-like and 229E-like viruses, we 
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289 generated the complete genome sequences of five representative bat-derived CoVs: three 

290 (BtKYNF63-9a, BtKYNL63-9b, and BtKYNL63-15) were from the NL63-like group and 

291 two (BtKY229E-l and BtKY229E-8) from the 229E-like group (L12-L14, Figure 2). For all 

292 the viruses newly described here, the genome structures follow an identical ORF 

293 arrangement: ORFlab-S-ORF4-E-M-N-ORF8 in 229E-related viruses and ORFlab-S-ORF3- 

294 E-M-N-ORFx in NF63-related viruses (Figure 5, Tables 3 and 4). The additional 

295 ORF8/ORF x was identified at the 3’ end of the genome in all bat NF63-like and 229E-like 

296 viruses characterized in this study, although it was missing in both human viruses (HCoV- 

297 229E and HCoV-NF63). The ORF8 in bat 229E-like genomes is named in analogy with the 

298 ORF8 of Ghanaian bat and dromedary 229E-like Co Vs (5, 12). The ORF8 of BtKY229E-l 

299 shared 60% protein identity with its closest relatives while BtKY229E-8 has a shorter and 

300 highly divergent ORF8. The ORFx of NF63-like viruses shared very low identity (21-33% at 

301 the amino acid level). Similarly to the bat 229E-like CoVs recently discovered in Ghana (5), 

302 the S genes in our bat 229E-like CoVs have a considerably longer 5’ S1 portion (additional 

303 185 amino acids) compared to FlCoV-229E and alpaca and dromedary 229E viruses (12). 

304 For comparison, we also included 21 genome sequences representative of the 

305 diversity in the genus Alphacoronavirus. The phylogeny based on the ORF lb protein 

306 alignment confirmed that NF63-like and 229E-like groups are monophyletic (Figure 6). 

307 Given that each group is associated with a specific bat genus, it is likely that the ORF lb 

308 genes of the human viruses (i.e. HCoV-NF63 and HCoV-229E) were ultimately derived from 

309 Triaenops- associated CoVs and Hipposideros- associ atcd CoVs, respectively. The 

310 relationship between Hipposideros bat CoVs and HCoV-229E was also demonstrated by 

311 Corman et al. (5) based on specimens obtained in Ghana. Compared to the viruses described 

312 in that study, the newly identified Kenyan viruses (BtKY229E-l and BtKY229E-8) were 

313 among those more distantly related to HCoV-229E (Figure 6 and Table 3). As for the NF63- 
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314 like group, HCoV-NL63 was nested within the diversity of three lineages of Triaenops- 

315 associated CoVs, among which BtKYNL63-9a showed the closest relationship in all genome 

316 regions with the exception of the S gene (Figure 6 and Table 3). 

317 Strikingly, the phylogeny of the S protein suggested an entirely different evolutionary 

318 history for HCoV-NL63 compared to the rest of the genome (Figure 6). Specifically, for all 

319 the proteins with the exception of S, HCoV-NL63 clustered with the NL63-like group. 

320 Flowever, in the S protein, HCoV-NL63 was deeply nested within the 229E-like group, 

321 associated exclusively with viruses from Hipposideros bats, and particularly similar to the 

322 sequences BtKY229E-l and BtKY229E-8 newly identified during this study (Figure 6). 

323 Interestingly, BtKY229E-l exhibited the closest resemblance to HCoV-NL63 in the receptor 

324 binding domain (RBD, (41)), especially in the three receptor binding motifs (RBM), whereas 

325 other viruses exhibited less similarity in these regions (Figure 7A). A phylogeny based on 

326 the RBD region confirmed our observation (Figure 7B), although it remains uncertain 

327 whether these bat viruses utilize the same host cell receptor. 

328 To further characterize this recombination event, we performed genome-scale 

329 similarity comparisons between HCoV-NL63 and related viruses (Figure 8). The analysis 

330 confirmed the chimeric nature of the HCoV-NL63 genome with only the spike protein 

331 involved in recombination via two break-points: one located near the 5’ end of the S gene and 

332 the other at around 200 nucleotides upstream of the 3’ end. To exclude the possibility of any 

333 artificial recombination, the break-point was further confirmed by RT-PCR and Sanger 

334 sequencing, using a single amplicon to cover each break-point. Collectively, these data show 

335 that HCoV-NL63 evolved from a recombination event between CoVs from the NL63-like 

336 and 229E-like groups. 

337 In addition to HCoV-NL63, we identified a number of other recombination events 

338 between divergent CoVs involving the S gene. One example is the BtKYNL63-15 newly 
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339 identified here. Throughout the genome, BtKYNL63-15 showed strong similarity (79% - 

340 99% protein identities in the ORFlab, ORF4, M, E, and N genes) with BtKYNL63-9b. In 

341 contrast, the genetic identity between S protein sequences of these viruses was only 53%. In 

342 the S protein phylogeny, BtKYNL63-15 did not cluster with NL63-like viruses but instead 

343 clustered with Miniopterus bat CoV HKU8 and Chaerophon bat CoV KY22 (Figure 6). 

344 Interestingly, HKU8 itself is a recombinant in the S gene region (Figure 6). These results 

345 suggest that the spike protein of CoVs is subject to relatively frequent recombination even 

346 between divergent viruses. 

347 

348 DISCUSSION 

349 In this study we significantly extended existing knowledge on CoV diversity, their 

350 association with specific bat species, the relatedness between bat and human Co Vs, and 

351 natural recombination events in the CoV spike (S) protein gene between viruses from 

352 different lineages. 

353 Notably, we found that host species poses a greater influence on CoV diversity in bats 

354 than the geographic distance, which can be explained by the ability of bats to fly (including 

355 long-distance migrations typical for some species) and disperse their pathogens over vast 

356 territories (42). A closer inspection of the Mantel correlogram suggests the presence of less 

357 structured (homogenous, mantel statistic r>0), and highly structured (mantel statistic r<0) 

358 diversity which, strikingly, corresponds to the division between intra-genera (10% ~ 20%) 

359 and inter-genera (> 20%) host genetic distances (Figure 4B). This suggests that within-genus 

360 virus transmissions occur significantly more frequently than between-genera transmissions, 

361 which is consistent with the previous observations that phylogenetic clustering is less 

362 constrained at the host species level than at the genus level (16, 43). While it is commonly 

363 accepted that host phylogeny constrains virus cross-species transmission to some extent (44), 
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364 the stronger demarcation at the genus level is of particular interest. In fact, bats of different 

365 species, genera, and families frequently roost together (in caves, tree holes, and other 

366 shelters), sometimes in dense aggregations, which provide abundant opportunity for 

367 mechanical transmission of pathogens between host species. Therefore, our data suggests that 

368 distinctions between bats at the genus level might mark a threshold where the differences in 

369 cellular and immunological environment become a major challenge for a virus to switch hosts. 

370 This, in turn, will lead to the pattern of ‘preferential host switching’ that has been observed in 

371 a number of other viruses (45). 

372 The detection of distinctive HCoV-NL63-like and HCoV-229E-like sequences in bats 

373 sheds new light on CoV evolution. In particular, we provide strong evidence that HCoV- 

374 NL63 has a zoonotic recombinant origin. Although the majority of the HCoV-NL63 genome 

375 originates from the viruses circulating in Triaenops afer bats, its spike protein gene is derived 

376 from a 229E-like virus circulating in Hipposideros spp. bats. However, despite the strong 

377 signal for recombination, both putative parental strains show substantial genetic distances 

378 from human CoVs. This most likely reflects extensive post-recombination sequence 

379 divergence, which in turn suggests that the recombination event has occurred prior to the 

380 emergence of HCoV-NL63 in humans. 

381 Most of the recombination events reported here involve breakpoints around the S 

382 gene. Indeed, similar breakpoints are also reported for SARS-CoV and SARS-like CoVs (24, 

383 25), HCoV-OC43 (26), and a feline CoV (46) such that it is seemingly a recombination 

384 ‘hotspot’ in many CoVs. It has been argued that a strong secondary structure between ORFla 

385 and S gene may promote transcriptional pulsing, facilitating recombination (47). However, 

386 there is also evidence that this recombination hotspot does not exist under non-selective 

387 conditions (48), such that it may reflect the successful spread of beneficial recombinants 
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388 rather than an elevated rate of recombination per se. This hypothesis is supported by the fact 

389 that the spike protein is intimately involved in the interaction with the host immune system. 

390 Importantly, our results also revealed that recombination has resulted in similar S 

391 proteins in the two human viruses HCoV-NL63 and HCoV-229E, such that acquisition of a 

392 229E-like S protein may have contributed to the emergence of NL63-like viruses in humans. 

393 However, despite this similarity of S protein sequences, these two human viruses utilize 

394 different receptors (ACE2 and aminopeptidase-N for HCoV-NL63 and HCoV-229E, 

395 respectively) to enter human cells. Within the 229E-like group, the RBD of HCoV-NL63 is 

396 more closely related to BtKY229E-8 than to HCoV-229E. The RBD of BtKY229E-8 exhibits 

397 greater similarity with that of HCoV-NL63 (Figure 7), and is therefore more likely to be the 

398 prototype of RBD in HCoV-NL63. 

399 Until recently, most reported recombination events in Co Vs involved viruses 

400 associated with closely related host species, although recombination between highly 

401 divergent CoVs has been demonstrated experimentally (49-51). The apparent lack of 

402 interspecies recombination under natural conditions is most likely due to the insufficient 

403 collection of complete genome sequences that are truly representative of coronavirus 

404 diversity. Indeed, a number of viruses, such as HKU2, display phylogenetic incongruence 

405 across different parts of the genome (52), although the lack of one of the putative parental 

406 strains has prevented clear identification of a recombinant history. 

407 Finally, our study provides insights into the evolution history of Co Vs. Although it is 

408 unclear whether bats are direct ancestors of all alpha- or beta-CoVs due to the presence of 

409 non-bat CoV clades at the basal phylogenetic positions of both genera (Figure 3), bat-borne 

410 CoVs constitute a substantial part of the diversities of alpha- or beta-CoVs. In addition, six 

411 lineages of non-bat CoVs are nested within the bat-borne clades. These likely represent 

412 independent and successful adaptations via shifts from the progenitor reservoir species (bats) 
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413 to other mammals. Four well-characterized human Co Vs lie within these clades. However, it 

414 is worth noting that bats may not have directly transmitted the viruses to humans. Indeed, 

415 HCoV-229E is more closely related to viruses circulating in camels than those in bats, 

416 suggesting that camels may be intermediate hosts between bats and humans (12). Similarly, 

417 other human CoVs such as SARS-CoV and MERS-CoV all use terrestrial mammals rather 

418 than bats as intermediate hosts, which have an increased chance of contact with humans. This 

419 underlines a typical zoonotic link of bat-associated CoV to humans via terrestrial mammals. 

420 

421 
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609 Figure Legends 

610 Figure 1. Map of Kenya showing the geographic locations of 30 bat collection sites. 

611 

612 Figure 2. Phylogeny of RdRp of all CoVs discovered in this study. The host (bat genus), 

613 number of sequences, and operational classification (lineage) are shown on the right of the 

614 tree. Branches that represent the minority host genera within the lineage defined by a single 

615 dominant host genus are marked with red and labeled with solid circle. The tree is mid-point 

616 rooted for clarity only and support values are only shown for internal branches. 

617 

618 Figure 3. Phylogenies of RdRp of alphacoronaviruses and betacoronaviruses. The trees 

619 are inferred using representative CoV sequences from this study as well as those obtained 

620 from the GenBank. The sequences are labeled with accession number/strain name, host 

621 (species) and geographic origin (three letter country code). Different colors are used to 

622 distinguish the following groups: Kenyan bat CoVs discovered during this study (orange), 

623 CoVs identified from non-bat mammals (blue), the Perimyotis subflavus virus previously 

624 reported to be related to HCoV-NL63 (green), and the remaining bat viruses (black). The 

625 lineage information for Kenyan CoVs is shown to the right of the phylogeny and matches that 

626 in Figure 2. 

627 

628 Figure 4. Mantel correlograms showing the Kenyan bat CoV RdRp sequences stratified 

629 by (A) geographic distances and (B) host genetic distances. A Mantel correlation index (r) 

630 was calculated for each of the distance classes. Under the null hypothesis of no relationship 

631 between the distance matrices, r values would be close to zero. Positive r values suggest 

632 lower genetic distances between case pairs, whereas negative r values suggest higher genetic 

633 distances between case pairs. Solid dots: r significantly different from zero; hollow dots: r not 

634 significantly different from zero. The graph (B) also shows kernel density plots for intra- 
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635 genus host distances density (grey solid line) and inter-genus host distances density (grey 

636 dotted line). The corresponding y-axis for the plot is shown on the right of the figure (B). The 

637 grey box in between the two plots marked the transition area between the intra-genus and 

638 inter-genus host genetic distances 

639 

640 Figure 5. Genome organization of 2 bat 229E-like and 3 bat NL63-like viruses sampled 

641 from Kenyan bats. A unified length scale is used for all the genomes. Within each genome, 

642 the ORFs (arrow boxes) and ribosomal frame shift sites (vertical lines) are indicated at their 

643 corresponding positions. 

644 

645 Figure 6. Phylogenetic analyses of major open reading frames of NL63-like and 229E- 

646 like Co Vs in the context of alphacorona viruses revealing evidence of recombination. 

647 Viruses sequenced in this study are shown in orange. Three potential recombinant genomes 

648 of HCoV-NL63, BtKYNL63-15, and HKU8 are indicated with red circles, blue triangles, and 

649 brown squares. 

650 

651 Figure 7. The relationships between HCoV-NL63 and related viruses at the receptor 

652 binding domain. (A) Alignment of NL63-like and 229E-like viruses and related viruses at 

653 the receptor binding domain. The positions of three receptor binding motifs (RBMs) are 

654 marked with double arrowed line. Residues in the NL63-CoV RBMs that directly contact the 

655 ACE2 receptor are marked with red downward arrows. (B) Phylogenetic relationships of 

656 NL63-like and 229E-like viruses at receptor binding domain of HCoV-NL63. The tree is 

657 based on an amino acid alignment and mid-point rooted. 

658 
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Figure 8. Recombination analyses of HCoV-NL63 using Simplot. Genome-scale similarity 
comparisons of HCoV-NL63 (query) against BtKYNL63-9a (major parental group, blue), 
BtKYNL63-9b (green), BtKY229E-8 (minor parental group, red), HCoV-229E (orange), 
BtCoV/FOlA-F2/Hip_aba/GHA/2010 (pink), and Alaca respiratory CoV (brown). A full 
genome structure, with reference to HCoV-NL63, is shown above the similarity plot, marking 
the positions and boundaries of the major open reading frames. At the beginning of the S 
gene, the flat-line followed by a sudden drop of similarity is due to a gap (deletion within 
HCoV-229E S gene) in the alignment. 


Tables 


Table 1. Results of Mantel tests and partial Mantel tests comparing two factors (host genetic 
distance and geographic distance) that predict the structure of virus genetic diversity 


Model 

r value for Kenyan bats (P value) 

Host 3 

0.5265 (P<0.0001) c 

Host 1 Geography 13 

0.5055 (P < 0.0001) c 

Geography 3 

0.2122 (P<0.0001) c 

Geography 1 Host b 

0.1285 (P = 0.0005) c 


675 a Mantel test; partial Mantel test; Significant at 0.001. 

676 

677 

678 Table 2. Multiple regression of virus genetic distance against host genetic distance and 

679 geographic distance in Kenyan bat CoVs (2007-2010) 


Variable 

Correlation coefficient 

/’-value 

Host 

7.58E-01 

1.00E-04 

Geography 

1.19E-06 

1.00E-02 


680 
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681 Table 3. Sequence comparisons of the Kenyan bat Co Vs with HCoV-229E or HCoV-NL63 



Genome 

identity 

Concatenated 

domains 

ADRP 

nsp5 

nsp!2 

nsp!3 nsp!4 nsp!5 

nsp!6 

lab 

S 

ORF3/4 

E 

M 

N 


Nucleotide % 




Amino acid % identity to HCoV-229E 






BtKY229E-l 

88 

98 

92 

98 

97 

99 97 96 

94 

95 

75 

92 

93 

90 

78 

BtKY229E-8 

88 

97 

89 

98 

98 

98 97 97 

94 

96 

74 

94 

97 

90 

68 


Nucleotide % 




Amino acid % identity to HCoV-NL63 






BtKYNL63-9a 

78 

91 

75 

89 

93 

94 89 88 

94 

86 

53 

67 

80 

82 

69 

BtKYNL63-9b 

68 

83 

51 

76 

88 

91 82 81 

84 

72 

52 

55 

64 

61 

51 

BtKYNL63-15 

68 

84 

51 

76 

88 

91 82 81 

87 

72 

49 

55 

62 

58 

52 


683 

684 


686 
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687 Table 4. Genomic features of the open reading frames (ORF) in the Kenyan bat CoVs and their putative transcription regulatory sequences (TRS). 


Virus 

229E-like virus 

NL63-like virus 

BtKY229E-l 

BtKY229E-8 

BtKYNL63-9a 

BtKYNL63-9b 

BtKYNL63-15 

Sequences (nt) 

27837 

27666 

28363 

28677 

28479 

GC% 

38 

39 

39 

43 

43 

ORF lab 
(nt) 

ORF size 
(nt) 

20286 

20304 

20277 

20349 

20355 

Putative 

TRS 

TCTCAACTAAAC(N219) 

AUG 

TCTCAACTAAAC(N219) 

AUG 

TCAACT AAAC(N214)AUG 

CTCAACT AAAC(N215)AU 

G 

TCTCAACTAAAC(N215) 
AUG 

S 

ORF size 
(nt) 

4095 

4095 

4119 

4122 

4134 

Putative 

TRS 

TCTCAACTAAATAAAA 

UG 

UCTCAACUAAA(4)AUG 

TCAACT AAAC(N 1) AUG 

CTCAACT AAAUG 

TCAACT A A AC(N 1) AUG 

ORF3/4 

ORF size 
(nt) 

681 

684 

684 

684 

684 

Putative 

TRS 

TCAACT AAAC(N37) AU 

G 

TCAACT AAAC(N37)AUG 

TCAACT AAAC(N37)AUG 

TCAACT AAAC(N37)AUG 

CAACUAAAC(N37)AUG 

E 

ORF size 
(nt) 

234 

234 

234 

234 

234 

Putative 

TRS 

TCTC AACTA AAC(N 151) 
AUG 

TCTTC AATGT AAC(N281) 
AUG 

TTATAAC(N79)AUG 

TCTGCTAAC(N 151) AUG 

TCTGAT AAC(N 151) AUG 

M 

ORF size 
(nt) 

681 

681 

693 

681 

684 

Putative 

TRS 

CT A A ACT A AAC(N4) AU 

G 

CTAAACTAAAC(N4)AUG 

CT A A AC(N 6) AU G 

TCT A AACT A A AC(N4) AU G 

UCUAAACUAAA(N4)AU 

G 

N 

ORF size 
(nt) 

1161 

1200 

1225 

1302 

1302 

Putative 

TRS 

TT AATCT A A AC(N 11)A 
UG 

ATCTAAAC(N11)AUG 

T CT A A ACT A A AC(N 3) AU G 

CT A A ACC AAAC(N4) AUG 

UCUAAACUAAAC(N4)A 

UG 

ORF8/ 

ORFx 

ORF size 
(nt) 

288 

198 

287 

291 

270 

Putative 

TRS 

UCAACUAAA AC( 1) AUG 

UCAACUAAAAC(4)AUG 

C AAAACCUAAC(N 12) AUG 

TCAACT AAAC(N567)AUG 

CAACUAAAC(N234)AUG 
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97 
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99 
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L 55, 


L4 . o 

- BtKY23 6 Rhinolophus landed L3 
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